Goto

Collaborating Authors

 dynamic game


Smooth Fictitious Play in Stochastic Games with Perturbed Payoffs and Unknown Transitions

Neural Information Processing Systems

Recent extensions to dynamic games of the well known fictitious play learning procedure in static games were proved to globally converge to stationary Nash equilibria in two important classes of dynamic games (zero-sum and identical-interest discounted stochastic games). However, those decentralized algorithms need the players to know exactly the model (the transition probabilities and their payoffs at every stage). To overcome these strong assumptions, our paper introduces regularizations of the recent algorithms which are moreover, model-free (players don't know the transitions and their payoffs are perturbed at every stage). Our novel procedures can be interpreted as extensions to stochastic games of the classical smooth fictitious play learning procedures in static games (where players best responses are regularized, thanks to a smooth perturbation of their payoff functions). We prove the convergence of our family of procedures to stationary regularized Nash equilibria in the same classes of dynamic games (zero-sum and identical interests discounted stochastic games). The proof uses the continuous smooth best-response dynamics counterparts, and stochastic approximation methods.


What Do Agents Think One Another Want? Level-2 Inverse Games for Inferring Agents' Estimates of Others' Objectives

Khan, Hamzah I., Li, Jingqi, Fridovich-Keil, David

arXiv.org Artificial Intelligence

Effectively interpreting strategic interactions among multiple agents requires us to infer each agent's objective from limited information. Existing inverse game-theoretic approaches frame this challenge in terms of a "level-1" inference problem, in which we take the perspective of a third-party observer and assume that individual agents share complete knowledge of one another's objectives. However, this assumption breaks down in decentralized, real-world scenarios like urban driving and bargaining, in which agents may act based on conflicting views of one another's objectives. We demonstrate the necessity of inferring agents' different estimates of each other's objectives through empirical examples, and by theoretically characterizing the prediction error of level-1 inference on fictitious gameplay data from linear-quadratic games. To address this fundamental issue, we propose a framework for level-2 inference to address the question: "What does each agent believe about other agents' objectives?" We prove that the level-2 inference problem is non-convex even in benign settings like linear-quadratic games, and we develop an efficient gradient-based approach for identifying local solutions. Experiments on a synthetic urban driving example show that our approach uncovers nuanced misalignments that level-1 methods miss.


Smooth Games of Configuration in the Linear-Quadratic Setting

Milzman, Jesse, Mao, Jeffrey, Loianno, Giuseppe

arXiv.org Artificial Intelligence

Dynamic game theory offers a toolbox for formalizing and solving for both cooperative and non-cooperative strategies in multi-agent scenarios. However, the optimal configuration of such games remains largely unexplored. While there is existing literature on the parametrization of dynamic games, little research examines this parametrization from a strategic perspective where each agent's configuration choice is influenced by the decisions of others. In this work, we introduce the concept of a game of configuration, providing a framework for the strategic fine-tuning of differential games. We define a game of configuration as a two-stage game within the setting of finite-horizon, affine-quadratic, AQ, differential games. In the first stage, each player chooses their corresponding configuration parameter, which will impact their dynamics and costs in the second stage. We provide the subgame perfect solution concept and a method for computing first stage cost gradients over the configuration space. This then allows us to formulate a gradient-based method for searching for local solutions to the configuration game, as well as provide necessary conditions for equilibrium configurations over their downstream (second stage) trajectories. We conclude by demonstrating the effectiveness of our approach in example AQ systems, both zero-sum and general-sum.


Optimal Modified Feedback Strategies in LQ Games under Control Imperfections

Rabbani, Mahdis, Mojahed, Navid, Nazari, Shima

arXiv.org Artificial Intelligence

Game-theoretic approaches and Nash equilibrium have been widely applied across various engineering domains. However, practical challenges such as disturbances, delays, and actuator limitations can hinder the precise execution of Nash equilibrium strategies. This work explores the impact of such implementation imperfections on game trajectories and players' costs within the context of a two-player linear quadratic (LQ) nonzero-sum game. Specifically, we analyze how small deviations by one player affect the state and cost function of the other player. To address these deviations, we propose an adjusted control policy that not only mitigates adverse effects optimally but can also exploit the deviations to enhance performance. Rigorous mathematical analysis and proofs are presented, demonstrating through a representative example that the proposed policy modification achieves up to $61\%$ improvement compared to the unadjusted feedback policy and up to $0.59\%$ compared to the feedback Nash strategy.


Generalized Nash Equilibrium Solutions in Dynamic Games With Shared Constraints

Pustilnik, Mark, Borrelli, Francesco

arXiv.org Artificial Intelligence

In dynamic games with shared constraints, Generalized Nash Equilibria (GNE) are often computed using the normalized solution concept, which assumes identical Lagrange multipliers for shared constraints across all players. While widely used, this approach excludes other potentially valuable GNE. This paper presents a novel method based on the Mixed Complementarity Problem (MCP) formulation to compute non-normalized GNE, expanding the solution space. We also propose a systematic approach for selecting the optimal GNE based on predefined criteria, enhancing practical flexibility. Numerical examples illustrate the methods effectiveness, offering an alternative to traditional normalized solutions.


Robust Deterministic Policy Gradient for Disturbance Attenuation and Its Application to Quadrotor Control

Lee, Taeho, Lee, Donghwan

arXiv.org Artificial Intelligence

Practical control systems pose significant challenges in identifying optimal control policies due to uncertainties in the system model and external disturbances. While $H_\infty$ control techniques are commonly used to design robust controllers that mitigate the effects of disturbances, these methods often require complex and computationally intensive calculations. To address this issue, this paper proposes a reinforcement learning algorithm called Robust Deterministic Policy Gradient (RDPG), which formulates the $H_\infty$ control problem as a two-player zero-sum dynamic game. In this formulation, one player (the user) aims to minimize the cost, while the other player (the adversary) seeks to maximize it. We then employ deterministic policy gradient (DPG) and its deep reinforcement learning counterpart to train a robust control policy with effective disturbance attenuation. In particular, for practical implementation, we introduce an algorithm called robust deep deterministic policy gradient (RDDPG), which employs a deep neural network architecture and integrates techniques from the twin-delayed deep deterministic policy gradient (TD3) to enhance stability and learning efficiency. To evaluate the proposed algorithm, we implement it on an unmanned aerial vehicle (UAV) tasked with following a predefined path in a disturbance-prone environment. The experimental results demonstrate that the proposed method outperforms other control approaches in terms of robustness against disturbances, enabling precise real-time tracking of moving targets even under severe disturbance conditions.


Ranking Joint Policies in Dynamic Games using Evolutionary Dynamics

Koliou, Natalia, Vouros, George

arXiv.org Artificial Intelligence

Game-theoretic solution concepts, such as the Nash equilibrium, have been key to finding stable joint actions in multi-player games. However, it has been shown that the dynamics of agents' interactions, even in simple two-player games with few strategies, are incapable of reaching Nash equilibria, exhibiting complex and unpredictable behavior. Instead, evolutionary approaches can describe the long-term persistence of strategies and filter out transient ones, accounting for the long-term dynamics of agents' interactions. Our goal is to identify agents' joint strategies that result in stable behavior, being resistant to changes, while also accounting for agents' payoffs, in dynamic games. Towards this goal, and building on previous results, this paper proposes transforming dynamic games into their empirical forms by considering agents' strategies instead of agents' actions, and applying the evolutionary methodology $\alpha$-Rank to evaluate and rank strategy profiles according to their long-term dynamics. This methodology not only allows us to identify joint strategies that are strong through agents' long-term interactions, but also provides a descriptive, transparent framework regarding the high ranking of these strategies. Experiments report on agents that aim to collaboratively solve a stochastic version of the graph coloring problem. We consider different styles of play as strategies to define the empirical game, and train policies realizing these strategies, using the DQN algorithm. Then we run simulations to generate the payoff matrix required by $\alpha$-Rank to rank joint strategies.


Smooth Fictitious Play in Stochastic Games with Perturbed Payoffs and Unknown Transitions

Neural Information Processing Systems

Recent extensions to dynamic games of the well known fictitious play learning procedure in static games were proved to globally converge to stationary Nash equilibria in two important classes of dynamic games (zero-sum and identical-interest discounted stochastic games). However, those decentralized algorithms need the players to know exactly the model (the transition probabilities and their payoffs at every stage). To overcome these strong assumptions, our paper introduces regularizations of the recent algorithms which are moreover, model-free (players don't know the transitions and their payoffs are perturbed at every stage). Our novel procedures can be interpreted as extensions to stochastic games of the classical smooth fictitious play learning procedures in static games (where players best responses are regularized, thanks to a smooth perturbation of their payoff functions). We prove the convergence of our family of procedures to stationary regularized Nash equilibria in the same classes of dynamic games (zero-sum and identical interests discounted stochastic games).


Real-Time Algorithms for Game-Theoretic Motion Planning and Control in Autonomous Racing using Near-Potential Function

Kalaria, Dvij, Maheshwari, Chinmay, Sastry, Shankar

arXiv.org Artificial Intelligence

Autonomous racing extends beyond the challenge of controlling a racecar at its physical limits. Professional racers employ strategic maneuvers to outwit other competing opponents to secure victory. While modern control algorithms can achieve human-level performance by computing offline racing lines for single-car scenarios, research on real-time algorithms for multi-car autonomous racing is limited. To bridge this gap, we develop game-theoretic modeling framework that incorporates the competitive aspect of autonomous racing like overtaking and blocking through a novel policy parametrization, while operating the car at its limit. Furthermore, we propose an algorithmic approach to compute the (approximate) Nash equilibrium strategy, which represents the optimal approach in the presence of competing agents. Specifically, we introduce an algorithm inspired by recently introduced framework of dynamic near-potential function, enabling real-time computation of the Nash equilibrium. Our approach comprises two phases: offline and online. During the offline phase, we use simulated racing data to learn a near-potential function that approximates utility changes for agents. This function facilitates the online computation of approximate Nash equilibria by maximizing its value. We evaluate our method in a head-to-head 3-car racing scenario, demonstrating superior performance compared to several existing baselines.


Inferring Short-Sightedness in Dynamic Noncooperative Games

Armstrong, Cade, Park, Ryan, Liu, Xinjie, Gupta, Kushagra, Fridovich-Keil, David

arXiv.org Artificial Intelligence

Dynamic game theory is an increasingly popular tool for modeling multi-agent, e.g. human-robot, interactions. Game-theoretic models presume that each agent wishes to minimize a private cost function that depends on others' actions. These games typically evolve over a fixed time horizon, which specifies the degree to which all agents care about the distant future. In practical settings, however, decision-makers may vary in their degree of short-sightedness. We conjecture that quantifying and estimating each agent's short-sightedness from online data will enable safer and more efficient interactions with other agents. To this end, we frame this inference problem as an inverse dynamic game. We consider a specific parametrization of each agent's objective function that smoothly interpolates myopic and farsighted planning. Games of this form are readily transformed into parametric mixed complementarity problems; we exploit the directional differentiability of solutions to these problems with respect to their hidden parameters in order to solve for agents' short-sightedness. We conduct several experiments simulating human behavior at a real-world crosswalk. The results of these experiments clearly demonstrate that by explicitly inferring agents' short-sightedness, we can recover more accurate game-theoretic models, which ultimately allow us to make better predictions of agents' behavior. Specifically, our results show up to a 30% more accurate prediction of myopic behavior compared to the baseline.